    Interpreting Neural Policies with Disentangled Tree Representations

    The advancement of robots, particularly those operating in complex human-centric environments, relies on control solutions driven by machine learning. Understanding how learning-based controllers make decisions is crucial since robots are often safety-critical systems. This motivates a formal and quantitative account of the explanatory factors behind the interpretability of robot learning. In this paper, we study the interpretability of compact neural policies through the lens of disentangled representation. We leverage decision trees to obtain factors of variation [1] for disentanglement in robot learning; these factors encapsulate skills, behaviors, or strategies toward solving tasks. To assess how well networks uncover the underlying task dynamics, we introduce interpretability metrics that measure the disentanglement of learned neural dynamics from the perspectives of decision concentration, mutual information, and modularity. We show that the connection between interpretability and disentanglement holds consistently across an extensive experimental analysis.
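
    The mutual-information and modularity metrics mentioned above can be illustrated compactly. The sketch below is our own simplification, not the paper's implementation: the function name, the activation binning, and the max-over-sum modularity proxy are all assumptions. It scores how strongly each neuron's mutual information concentrates on a single decision-tree-derived factor of variation.

```python
# Illustrative sketch (not the paper's code): a mutual-information-based
# modularity proxy between neuron activations and decision-tree factor labels.
import numpy as np
from sklearn.metrics import mutual_info_score

def modularity_score(activations, factor_labels, n_bins=16):
    """activations: (timesteps, neurons) float array of neuron outputs.
    factor_labels: list of (timesteps,) integer arrays, one per factor of
    variation extracted from the decision tree. Returns a score in [0, 1],
    where 1 means every neuron's mutual information sits on a single factor."""
    scores = []
    for neuron in activations.T:
        # Discretize the continuous activation so MI can be estimated by counting.
        edges = np.histogram_bin_edges(neuron, bins=n_bins)
        binned = np.digitize(neuron, edges)
        mi = np.array([mutual_info_score(binned, f) for f in factor_labels])
        if mi.sum() > 0:
            scores.append(mi.max() / mi.sum())  # concentration on one factor
    return float(np.mean(scores)) if scores else 0.0
```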

    Solving Continuous Control via Q-learning

    While there has been substantial success in solving continuous control with actor-critic methods, simpler critic-only methods such as Q-learning find limited application in the associated high-dimensional action spaces. However, most actor-critic methods come at the cost of added complexity: heuristics for stabilisation, compute requirements, and wider hyperparameter search spaces. We show that a simple modification of deep Q-learning largely alleviates these issues. By combining bang-bang action discretization with value decomposition, framing single-agent control as cooperative multi-agent reinforcement learning (MARL), this simple critic-only approach matches the performance of state-of-the-art continuous actor-critic methods when learning from features or pixels. We extend classical bandit examples from cooperative MARL to provide intuition for how decoupled critics leverage state information to coordinate joint optimization, and demonstrate surprisingly strong performance across a variety of continuous control tasks.
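
    The combination of bang-bang discretization and value decomposition lends itself to a short sketch. The code below is a minimal illustration under our own assumptions (PyTorch, layer sizes, and naming are ours, not the paper's): each action dimension gets its own two-way Q-head over {-1, +1}, the joint value is the mean of the per-dimension values, and greedy action selection therefore factorizes across dimensions.

```python
# Minimal sketch of a decoupled critic for bang-bang control (our assumptions).
import torch
import torch.nn as nn

class DecoupledBangBangCritic(nn.Module):
    def __init__(self, obs_dim, action_dims, hidden=256):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, hidden), nn.ReLU())
        self.heads = nn.Linear(hidden, action_dims * 2)   # 2 bins per dimension
        self.action_dims = action_dims

    def forward(self, obs):
        q = self.heads(self.trunk(obs))
        return q.view(-1, self.action_dims, 2)            # (batch, dims, 2)

    def greedy_action(self, obs):
        idx = self.forward(obs).argmax(dim=-1)            # per-dimension argmax
        return idx.float() * 2.0 - 1.0                    # map {0, 1} -> {-1, +1}

    def joint_value(self, obs, action):
        q = self.forward(obs)
        idx = ((action + 1.0) / 2.0).long().unsqueeze(-1)
        return q.gather(-1, idx).squeeze(-1).mean(dim=-1)  # value decomposition
```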

    Learning to Plan via Deep Optimistic Value Exploration

    Deep exploration requires coordinated long-term planning. We present a model-based reinforcement learning algorithm that guides policy learning through a value function that exhibits optimism in the face of uncertainty. We capture uncertainty over values by combining predictions from an ensemble of models and formulate an upper confidence bound (UCB) objective to recover optimistic estimates. Training the policy on ensemble rollouts with the learned value function as the terminal cost allows for projecting long-term interactions into a limited planning horizon, thus enabling deep optimistic exploration. We do not assume a priori knowledge of either the dynamics or the reward function. We demonstrate that our approach can accommodate both dense and sparse reward signals, while improving sample complexity on a variety of benchmark tasks. Keywords: Reinforcement Learning; Deep Exploration; Model-Based; Value Function; UCB. Funding: Office of Naval Research; Qualcomm; Toyota Research Institute.
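
    The optimistic value estimate described above reduces to a simple aggregation over ensemble predictions. The snippet below is a hedged sketch of that one step (function name and the beta coefficient are our assumptions); the full algorithm additionally trains the dynamics ensemble, the value function, and the policy.

```python
# Sketch: turn an ensemble of value predictions into a UCB-style optimistic
# estimate used as the terminal cost of short model rollouts (our naming).
import numpy as np

def optimistic_value(ensemble_values, beta=1.0):
    """ensemble_values: (n_models, batch) value predictions for the same states.
    Returns mean + beta * std, i.e. optimism in the face of uncertainty."""
    return ensemble_values.mean(axis=0) + beta * ensemble_values.std(axis=0)
```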

    Locomotion Planning through a Hybrid Bayesian Trajectory Optimization

    Locomotion planning for legged systems requires reasoning about suitable contact schedules. The contact sequence and timings constitute a hybrid dynamical system and prescribe a subset of achievable motions. State-of-the-art approaches cast motion planning as an optimal control problem. In order to decrease computational complexity, one common strategy separates footstep planning from motion optimization and plans contacts using heuristics. In this paper, we propose to learn contact schedule selection from high-level task descriptors using Bayesian Optimization. A bi-level optimization is defined in which a Gaussian Process model predicts the performance of trajectories generated by a motion planning nonlinear program. The agent, therefore, retains the ability to reason about suitable contact schedules, while explicit computation of the corresponding gradients is avoided. We delineate the algorithm in its general form and provide results for planning single-legged hopping. Our method is capable of learning contact schedule transitions that align with human intuition. It performs competitively against a heuristic baseline in predicting task-appropriate contact schedules.
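
    The bi-level structure can be sketched as an outer Bayesian-optimization loop over contact timings wrapped around an inner motion-planning solve. The code below is an illustrative simplification: `solve_motion_nlp` is a hypothetical stand-in for the inner nonlinear program, and the kernel, acquisition rule, and candidate sampling are our assumptions, not the paper's choices.

```python
# Outer loop: a Gaussian Process models trajectory cost as a function of the
# contact-schedule parameters; a lower-confidence-bound rule picks the next query.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def bayes_opt_contact_schedule(solve_motion_nlp, bounds, n_init=5, n_iter=20, kappa=2.0):
    """bounds: (n_params, 2) array of lower/upper limits on contact timings."""
    rng = np.random.default_rng(0)
    X = rng.uniform(bounds[:, 0], bounds[:, 1], size=(n_init, bounds.shape[0]))
    y = np.array([solve_motion_nlp(x) for x in X])        # inner NLP cost
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    for _ in range(n_iter):
        gp.fit(X, y)
        cand = rng.uniform(bounds[:, 0], bounds[:, 1], size=(256, bounds.shape[0]))
        mu, sigma = gp.predict(cand, return_std=True)
        x_next = cand[np.argmin(mu - kappa * sigma)]      # optimistic (low-cost) pick
        X = np.vstack([X, x_next])
        y = np.append(y, solve_motion_nlp(x_next))
    best = np.argmin(y)
    return X[best], y[best]
```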

    Inclusion of Angular Momentum During Planning for Capture Point Based Walking

    When walking at high speeds, the swing legs of robots produce a non-negligible angular momentum rate. To accommodate this, we provide a reference trajectory generator for bipedal walking that incorporates predicted centroidal angular momentum at the planning stage. This can be done efficiently as the Centroidal Moment Pivot (CMP), the Instantaneous Capture Point (ICP), and the center of mass (CoM) all have closed-form trajectory solutions due to their linear dynamics. These solutions are then used to produce smooth, continuous trajectories. We furthermore provide a lightweight model to estimate the angular momentum induced during the leg-swing phase of the gait cycle. Our proposed trajectory generator is tested thoroughly in simulation and has been shown to operate successfully on the real hardware.
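
    For context, the linear dynamics that make these closed-form solutions possible are usually written in the standard capture-point form below (a textbook formulation, not copied from the paper), where \xi is the ICP, r_cmp the CMP, and \omega_0 = \sqrt{g / z_0} with CoM height z_0:

```latex
\dot{\xi} = \omega_0\,\bigl(\xi - r_{\mathrm{cmp}}\bigr),
\qquad
\xi(t) = r_{\mathrm{cmp}} + \bigl(\xi_0 - r_{\mathrm{cmp}}\bigr)\, e^{\omega_0 t}
\quad \text{(for piecewise-constant } r_{\mathrm{cmp}}\text{)}
```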

    Neighborhood Mixup Experience Replay: Local Convex Interpolation for Improved Sample Efficiency in Continuous Control Tasks

    Experience replay plays a crucial role in improving the sample efficiency of deep reinforcement learning agents. Recent advances in experience replay propose using Mixup (Zhang et al., 2018) to further improve sample efficiency via synthetic sample generation. We build upon this technique with Neighborhood Mixup Experience Replay (NMER), a geometrically-grounded replay buffer that interpolates transitions with their closest neighbors in state-action space. NMER preserves a locally linear approximation of the transition manifold by only applying Mixup between transitions with vicinal state-action features. Under NMER, a given transition's set of state-action neighbors is dynamic and episode-agnostic, in turn encouraging greater policy generalizability via inter-episode interpolation. We combine our approach with recent off-policy deep reinforcement learning algorithms and evaluate on continuous control environments. We observe that NMER improves sample efficiency by an average 94% (TD3) and 29% (SAC) over baseline replay buffers, enabling agents to effectively recombine previous experiences and learn from limited data. Comment: Accepted to L4DC 2022.
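
    The NMER sampling rule itself is compact enough to sketch. The snippet below is our own simplified illustration (naming, the Beta-distribution Mixup coefficient, and the brute-force neighbor search are assumptions): draw a stored transition, find its nearest neighbor in state-action space, and return a convex combination of the two as a synthetic training sample.

```python
# Simplified NMER-style sampling sketch (our assumptions, not the reference code).
import numpy as np

def nmer_sample(states, actions, rewards, next_states, rng, alpha=1.0):
    """All arrays share the leading (buffer_size) dimension."""
    i = rng.integers(len(states))
    sa = np.concatenate([states, actions], axis=1)   # state-action features
    dists = np.linalg.norm(sa - sa[i], axis=1)
    dists[i] = np.inf                                # exclude the anchor itself
    j = int(np.argmin(dists))
    lam = rng.beta(alpha, alpha)                     # Mixup interpolation weight
    mix = lambda a, b: lam * a + (1.0 - lam) * b
    return (mix(states[i], states[j]), mix(actions[i], actions[j]),
            mix(rewards[i], rewards[j]), mix(next_states[i], next_states[j]))
```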

    Good Posture, Good Balance: Comparison of bio-inspired and model-based approaches for posture control of humanoid robots

    This article provides a theoretical and thorough experimental comparison of two distinct posture control approaches: 1) a fully model-based control approach and 2) a biologically inspired approach derived from human observations. While the robotic approach can easily be applied to balancing in three-dimensional (3-D) and multicontact (MC) situations, the biologically inspired balancer currently only works in two-dimensional situations but shows interesting robustness properties under time delays in the feedback loop. This is an important feature when considering the signal transmission and processing properties of the human sensorimotor system. Both controllers were evaluated in a series of experiments with a torque-controlled humanoid robot (TORO). This article concludes with some suggestions for the improvement of model-based balancing approaches in robotics.